The  Importance  of  Lexicalized  Syntax  Models 
for  Natural  Language  Generation  Tasks 


Hal Daumé III, Kevin Knight, Irene Langkilde-Geary, Daniel Marcu and Kenji Yamada

Information  Sciences  Institute 
Computer  Science  Department 
University  of  Southern  California 
4676  Admiralty  Way,  Suite  1001 
Marina  del  Rey,  CA  90292 

{hdaume, knight, ilangkil, marcu, kyamada}@isi.edu


Abstract 

The parsing community has long recognized the importance of
lexicalized models of syntax. By contrast, these models do not
appear to have had an impact on the statistical NLG community.
To demonstrate their importance in NLG, we show that a lexicalized
model of syntax improves the performance of a statistical text
compression system, and show results that suggest it would also
improve the performance of an MT application and a pure natural
language generation system.

1  Introduction 

We  distinguish  between  three  types  of  language 
models: 

•  n-gram language models look only at word sequences to gauge
the quality of a sentence.

•  Non-lexicalized syntax models consider only syntactic structure
down to the level of word tags in assessing the grammaticality of
a sentence. A PCFG is an example of such a model.

•  Lexicalized syntax models take into account both sentence
syntax and lexical values when determining the quality of a
sentence.

The parsing community has long recognized the importance of
lexicalized models of syntax for building robust natural language
applications. For example, Charniak (1997) showed that by using a
lexicalized syntax model instead of a non-lexicalized PCFG syntax
model trained on Penn Treebank data, one can increase the
performance of a syntactic parser from 73.75% labeled recall and
precision to 83.75%. More sophisticated lexicalized models of
syntax (Collins, 1997; Charniak, 2000) have increased the
performance of syntactic parsers to 90% labeled recall and
precision. Lexicalized models of syntax have also proven useful in
speech recognition (Chelba and Jelinek, 1998) and language
modeling (Charniak, 2001; Roark, 2001).

By contrast, lexicalized models of syntax do not appear to have
had an impact on the statistical NLG community. Langkilde and
Knight (1998), for example, use an n-gram model to select between
different lexical renderings of a meaning representation. Knight
and Marcu (2000) use a combination of bigram and context-free
probabilities to select between sentence compressions. To our
knowledge, the only NLG work that resonates with the work in
parsing, speech recognition, and language modeling is that of
Bangalore and Rambow (2000), who show that a statistical
generation system that uses a lexicalized hierarchical model of
syntax outperforms a system that uses a random model.

Given the limited interest in exploiting lexicalized models of
syntax in NLG, one might conclude that such models have no role to
play in this area. In this paper, we show that this is not the
case. To demonstrate the importance of lexicalized models of
syntax in NLG, we focus on three distinct tasks, each involving a
generation component. First, we show that a lexicalized model of
syntax can improve the performance of a statistics-based text
compression system. Second, we show that a lexicalized model of
syntax may improve the outputs of a Chinese-to-English machine
translation system. Finally, we analyze the results of a pure
natural language generation system.¹

In  each  experiment,  we  assess  the  impact  that  a 
lexicalized  model  of  syntax  may  have  on  improving 
the  quality  of  existing  systems  using  an  off-the-shelf 
component:  the  parser  built  by  Charniak  (2000). 

Our use of Charniak’s parser provides for a loose coupling of a
lexicalized model of syntax with a generation system, which is far
from ideal. Nevertheless, these experiments provide evidence that
lexicalized models of syntax can improve the quality of the
outputs of statistics-based generators. Our results motivate work
aimed at building generation systems that choose between possible
renderings of the same meaning using not only n-gram probabilistic
models, but also lexicalized models of syntax, such as those
proposed by Collins (1997) and Charniak (2000).

2  Statistical  Summarization 

2.1  Document  Compression

To assess the impact that lexicalized models of syntax may have on
the task of summary generation, we used a noisy-channel document
compression system (Daumé III and Marcu, 2002), which generalizes
the sentence compression system developed by Knight and Marcu
(2000) to the document level.

In Daumé and Marcu’s system, possible document compressions are
packed into a shared-forest structure which contains explicit
channel-model probability scores. These channel probabilities are
based on discourse PCFG probabilities and syntactic PCFG
probabilities, as well as compression probabilities. None of these
probabilities are lexicalized.

We then use a generic forest ranker which combines these channel
probabilities with bigram-based source model probabilities
(Langkilde, 2000) to extract the top scoring compression. The only
point at which lexicalization enters the model is in the
bigram-based source model. This leads to the generation of many
poor syntactic structures. For instance, the model does not know
the difference between transitive and intransitive verbs, and
therefore will often “compress off” the objects of transitive
verbs.
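
The ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the forest ranker of Langkilde (2000): it scores a flat list of candidates rather than a packed forest, and the function names, the unseen-bigram penalty and all probability values are hypothetical.

```python
def bigram_logprob(words, table, unseen=-10.0):
    """Source-model score: sum of log P(w_i | w_{i-1}) with sentence boundaries."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(table.get(pair, unseen) for pair in zip(padded, padded[1:]))


def best_compression(candidates, table):
    """Pick the candidate maximizing source log-prob + channel log-prob.

    candidates: iterable of (words, channel_logprob) pairs.
    """
    return max(candidates, key=lambda c: bigram_logprob(c[0], table) + c[1])[0]


# Hypothetical toy numbers, for illustration only.
TOY_BIGRAMS = {
    ("<s>", "it"): -1.0, ("it", "operates"): -2.0,
    ("operates", "a"): -2.0, ("a", "refinery"): -3.0,
    ("refinery", "</s>"): -1.0, ("operates", "</s>"): -2.5,
}
TOY_CANDIDATES = [
    (["it", "operates", "a", "refinery"], -4.0),
    (["it", "operates"], -1.0),  # the object has been "compressed off"
]
print(best_compression(TOY_CANDIDATES, TOY_BIGRAMS))
```

With these toy numbers the shorter candidate wins, because nothing in a bigram table records that the verb needs an object; this is the kind of failure the paragraph above describes.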

¹ We are grateful to Eugene Charniak for suggesting these
experiments to us.


After the system was constructed, we evaluated its performance on
16 documents chosen from the Wall Street Journal portion of the
Penn Treebank, each containing between 41 and 87 words.² We also
evaluated it on 5 documents selected from the Mitre corpus
(Hirschman et al., 1999), each containing between 64 and 91 words.
We used two corpora to see whether the system’s performance varied
with text genre. In the evaluations, the system outperformed a
baseline system presented by Knight and Marcu (2000), applied
iteratively. However, it still made many errors that it would not
have made had it had a good grasp of English grammar. We performed
an exhaustive error analysis on the system to see where it could
be improved.

2.2  Error  Analysis 

After analyzing the errors the system made, we classified them
into ten classes, each listed in Table 1 with an example of the
error and an example of the error fixed.

We then tabulated the frequency with which these errors occurred
in the summaries produced by the system. The counts are shown in
Table 2.

Five of these error classes can be considered grammaticality
errors: Det, Mod, Comp, NoVerb and Num. These five alone account
for 37 errors out of a total of 57 errors. It seemed that the
integration of a lexicalized model of syntax into the source model
would easily remove all of these problems. Since the document
compression system was able to output an n-best list of possible
compressions, we were able to rerank this list using Charniak’s
parser.

2.3  Syntax-Based  Reranking 

To do this reranking, we modified Charniak’s (2000) parser so that
instead of outputting the optimal parse tree for a given sentence,
it outputs its non-normalized maximum entropy score. We then ran
the modified parser on the 1000 best compressions according to the
bigram model, normalized the probabilities by length and chose the
single best compression.
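
A minimal sketch of this reranking step is given below. The text does not say exactly how scores were normalized by length, so dividing a log score by the word count is an assumption here, and `parser_logscore` is a hypothetical stand-in for the modified parser (any callable that maps a sentence to a log score). The scores in the toy table are invented.

```python
def rerank(nbest, parser_logscore):
    """Return the candidate with the best length-normalized parser score.

    nbest: list of candidate compressions (strings).
    parser_logscore: callable returning a log score for a sentence
    (hypothetical stand-in for the modified parser).
    """
    def per_word(sentence):
        # Assumed normalization: log score divided by the number of words.
        n_words = max(len(sentence.split()), 1)
        return parser_logscore(sentence) / n_words

    return max(nbest, key=per_word)


# Invented scores, for illustration only.
TOY_SCORES = {
    "El Paso owns and operates a refinery": -21.0,
    "El Paso owns and operates refinery": -24.0,
}
best = rerank(list(TOY_SCORES), TOY_SCORES.__getitem__)
print(best)
```

Without some length normalization, shorter candidates would tend to win simply because they accumulate fewer log-probability terms.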

² Because there are an exponential number of summaries that can be
generated for any text, the decoder runs out of memory for longer
documents; therefore, we selected shorter subtexts from the
original documents.


Grammatical Errors

Det - A noun or noun phrase is missing a determiner.
    Erroneous  “El Paso owns and operates refinery.”
    Fixed      “El Paso owns and operates a refinery.”

Comp - The complement of a verb or noun is missing, rendering the
output ungrammatical.
    Erroneous  “Banco Exterior was run by politicians who lacked
               the skills or the will.”
    Fixed      “Banco Exterior was run by politicians who lacked
               the skills or the will to do ... .”

Num - Often if the original document contains a percentage (for
instance “5%”), the number will be dropped without the percentage
sign.
    Erroneous  “The rate is %.”
    Fixed      “The rate is 5%.”

NoVerb - Sentence lacking a verb.
    Erroneous  “Stewart, the builder.”
    Fixed      “It is named for Stewart, the builder.”

Discourse/Coherence-specific Errors

Ante - The antecedent of an anaphor has been dropped, resulting in
incoherence.
    Erroneous  “Terms are subject to change, the company said.”
    Fixed      “Terms are subject to change, Banco Exterior said.”

Cue+ - Uninterpretable cue word. For instance, if D1 and D2 form a
discourse constituent, D2 contrasts with D1 and begins with “But”,
and D1 is dropped, then so should be “But.”
    Erroneous  “But the proposed transaction calls for an exchange
               of the debt ...”
    Fixed      “The proposed transaction calls for an exchange of
               the debt ...”

Cue- - A cue word or phrase is missing, rendering the output
incoherent.
    Erroneous  “The President of the United States urged the armed
               forces to advance. His commanders did not have the
               initiative.”
    Fixed      “The President of the United States urged the armed
               forces to advance. When he did, his commanders did
               not have the initiative.”

Summarization-specific Errors

Mod - A nominal modifier which should not have been dropped has
been, and significant meaning is lost.
    Erroneous  “Tons will fill damp barns across the land.”
    Fixed      “Tons of vegetables ... will fill damp barns across
               the land.”

Miss - The compression misses important information.

Extra - The compression contains unimportant information.


Table 1: Types of errors in the texts generated by the model.


This is far from an ideal language model. For instance, there is
no guarantee that the structure the parser will derive for the
sentence will be the same structure the compression model
generated. Furthermore, since the parser assumes only one input
sentence, the scores produced by the best parse for two different
sentences may not be comparable.

2.4  Evaluation 

After reranking, we performed the same error analysis as before.
The results for the new error analysis are summarized in Table 3.

We can see from the delta row in Table 3 (negative numbers are
good) that we removed most grammaticality errors. We saw modest
improvement with respect to the dropping of important modifiers;
this is to be expected, though, since the syntax model has no idea
of “importance.” The same argument explains the minimal change in
the missing antecedent problems. We removed all instances of extra
information but added seven additional counts of missing
information.

Any summarization system must balance length of summary against
informational and grammatical quality. In order to have more
documents of higher grammaticality, more words are often
necessary, which reduces the amount of information which can be
packed into a summary of comparable length. Here, by removing most
of the grammaticality errors, we caused the system to drop some of
the important information.


Table 2: Tabulation of the errors in the texts generated by the model.

Table 3: Tabulation of the errors in the texts generated by the model with syntax-based rescoring.

[Table rows: wsj_0607, wsj_0616, wsj_0632, wsj_0654, wsj_0655, wsj_0667, wsj_0689, wsj_1126, wsj_1146, wsj_1189, wsj_1307, wsj_1331, wsj_1346, wsj_1376, wsj_1380, wsj_2386, rm5-10, rm5-22, rm5-27, rm5-6, rm5-9, Count, Delta. Grammaticality columns: Det, Mod, Comp, Num, NoVerb.]

To determine whether the changes in the system are noticeable to a
user, we carried out a subjective evaluation. We presented 8 human
judges with the outputs generated by the original text compression
system, the results after rescoring, and human-generated
compressions. These judges were asked to rank outputs on a scale
from 1 to 5 (5 being the best) on metrics of compression rate
(Cmp), Grammaticality (Grm), Coherence (Coh) and Compression
Quality (Qual). The results of this evaluation are summarized in
Table 4.

                  WSJ Texts                 Mitre Texts
            Cmp   Grm   Coh   Qual    Cmp   Grm   Coh   Qual
Old         0.47  3.11  2.98  2.55    0.47  3.57  2.90  2.80
Rescored    0.42  3.41  3.11  2.64    0.30  3.00  3.05  2.23
Hand        0.59  4.38  4.33  3.97    0.46  4.70  4.45  4.10

Table 4: Evaluation Results

In the Wall Street Journal data, there was a moderate improvement
in grammaticality, coherence and quality, as the error analysis
suggested. In the Mitre data, grammaticality and quality went down
significantly, while coherence remained steady. This can be
attributed to two factors. First, there were few errors in the
Mitre data to start with and thus less room for improvement;
second, the Mitre data is out of domain for both the document
compression system and the parser, which leads to less reliable
statistics.

3  Machine  Translation 

For our machine translation experiments, we use the statistical MT
system of Yamada and Knight (2001). This system produces English
translations of foreign language sentences by exploiting three
components:

•  A Translation model (TM). For any given pair (English parse
tree $e$, foreign language string $f$), this model returns a
probability $P(f \mid e)$.

•  A Language model (LM). For any English tree $e$, this model
returns a probability $P(e)$.

•  A Search algorithm. Given a foreign language sentence $f$, this
algorithm searches for the English tree $e$ that maximizes
$P(e \mid f) \propto P(e) \cdot P(f \mid e)$.
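
The search objective in the list above is the usual noisy-channel decomposition. As a brief reminder of why the translation model and the language model can simply be multiplied, Bayes' rule gives the following, where $\hat{e}$ (a symbol introduced here for illustration) denotes the tree selected by the search:

```latex
\begin{align*}
\hat{e} &= \arg\max_{e} P(e \mid f) \\
        &= \arg\max_{e} \frac{P(e)\, P(f \mid e)}{P(f)} \\
        &= \arg\max_{e} P(e)\, P(f \mid e)
\end{align*}
```

The last step holds because $P(f)$ does not depend on $e$, so dropping it does not change which tree attains the maximum.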

Yamada and Knight (2001; 2002) describe the TM and the search
algorithm, respectively. The search algorithm takes a foreign
language sentence and produces a vast number of candidate English
trees, packed into a forest structure, as in summarization (see
Section 2), then searches for the highest-scoring tree. The
current algorithm uses a trigram LM, ignoring the internal
structure of the candidate trees. Therefore, the system does not
necessarily produce syntactically correct translations.

To see the effect of a lexicalized syntax language model, we
performed what automatic speech recognition (ASR) researchers call
“a cheating experiment”. For a given acoustic signal, ASR
researchers know both the correct target transcription, A (which
was done by hand), and the current automatic system transcription,
B. The probabilistic score for B will be greater than that for A.
However, a new knowledge source may provide additional scores that
cause A to be reranked higher. This is cheating for two reasons:
(1) it does not include a search algorithm that integrates the new
knowledge source, and (2) there may be an incorrect string C that
scores higher than both A and B under reranking.

This kind of experiment is not regularly done in machine
translation because there is no single correct translation A.
Using a human translation as a target A does not work, because
human translations are often non-literal and current statistical
models do not recognize them as good translations. To remedy this,
we manually created a set of target translations, which we called
“hope” translations. These are good translations that we believe
to be within reach of the system: we can reasonably hope that the
system would prefer them.

In our experiment, we score both hope sentences, A, and current
system translations, B, over a number of examples, using
combinations of these knowledge sources:

•  T: translation model

•  R: word-trigram language model

•  C: score from Charniak’s parser

Table 5 shows the results. A sentence marked “decl” is a decoder
output (Sentence B) and “hope” is a hope sentence (Sentence A);
lower scores are better. The last row shows the average difference
of the score (a positive difference means that the system prefers
“hope” sentences over current system outputs). Just above the last
row is the number of sentences which ranked better.³
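
The comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the scoring code used in the experiment: each knowledge source is assumed to give a cost where lower is better, a combination such as T+2C is taken to be a weighted sum, and the difference is taken as decoder cost minus hope cost so that a positive value favors the hope sentence. All numbers are invented.

```python
def combined(scores, weights):
    """Weighted sum of per-source costs (lower is better)."""
    return sum(w * scores[name] for name, w in weights.items())


def compare(pairs, weights):
    """pairs: list of (decl_scores, hope_scores) dicts keyed by 'T', 'R', 'C'.

    Returns (number of pairs where the hope sentence is preferred,
             average of decl cost minus hope cost).
    """
    diffs = [combined(decl, weights) - combined(hope, weights)
             for decl, hope in pairs]
    wins = sum(d > 0 for d in diffs)
    return wins, sum(diffs) / len(diffs)


# One invented sentence pair, for illustration only.
TOY_PAIRS = [
    ({"T": 10.0, "R": 20.0, "C": 30.0},   # decoder output ("decl")
     {"T": 12.0, "R": 25.0, "C": 24.0}),  # hope sentence
]
for label, weights in [("T+R", {"T": 1, "R": 1}),
                       ("T+C", {"T": 1, "C": 1}),
                       ("T+2C", {"T": 1, "C": 2})]:
    print(label, compare(TOY_PAIRS, weights))
```

In this toy pair the trigram combination keeps the decoder output on top, while either parser-based combination prefers the hope sentence.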

³ The score from Charniak’s parser (C) is
$-\log(\text{un-normalized prob})$, thus it may yield a negative
value. The TM scores are calculated from parse trees, not from


decl  great  attention  his  first  visit  to  china 

hope  his  first  visit  is  highly  important  to  the  Chinese  side 


Table  5 :  Results  of  the  experiment  on  the  MT  system 


(H37 :ADJUNCT "earlier"
     :LOGICAL-SUBJECT (H5 / "company")
     / "announce"
     :ADJUNCT (H34 :ADJUNCT "its"
                   / "plan"))

Figure 1: An underspecified input for the sentence,
“Earlier the company announced its plans.”

As expected, our current system (T+R) almost never ranks the hope
sentence higher than the system sentence. If we replace the
trigram model with Charniak’s parser (T+C), the hope sentence is
ranked higher than the originally preferred sentences in 9 cases.
These results are slightly better than those obtained by adding
the parser score to the existing models (T+R+C). Best results are
obtained by assigning a higher weight to the parser score relative
to the translation model (T+2C): here, the hope sentence comes out
on top in 12 cases.

4  Natural  Language  Generation 

From the sentences in section 23 of the Penn Treebank, inputs to
the HALogen generator system were automatically derived and then
regenerated (Langkilde-Geary, 2002). The inputs derived were
feature-value dependency structures, where the features represent
syntactic relationships between values, and the values were either
words in root form or nested feature-value structures. The inputs
were underspecified with respect to properties such as
part-of-speech category, tense, voice, and number, as well as
constituent order and some closed-class words like auxiliary verbs
and determiners. Underspecification tests the ability of a
language model to pick the best solution from among a set of
choices.

HALogen overgenerates possible expressions for an input, in part
because it enumerates possible choices for underspecified details.
It then ranks potential outputs using an n-gram model. Both bigram
and trigram models were available, but because the trigram model
takes more than an order of magnitude more time to use (roughly 40
minutes per sentence, versus 1 minute per sentence for the bigram
model), the sentences were generated using the bigram model.
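
The overgenerate-and-rank idea can be illustrated with a small sketch. It is not HALogen's implementation: HALogen packs alternatives into a forest, whereas this example enumerates a flat list, fixes the constituent order, and uses a hypothetical scorer that merely counts attested bigrams. The slot alternatives and the bigram set are invented.

```python
from itertools import product


def overgenerate(slots):
    """Yield every sentence obtained by picking one alternative per slot.

    slots: list of lists of alternative words; "" means the word is omitted.
    """
    for choice in product(*slots):
        yield " ".join(word for word in choice if word)


def rank(candidates, score):
    """Sort candidates from best to worst under the given scoring function."""
    return sorted(candidates, key=score, reverse=True)


# Invented alternatives for determiner, tense and number.
SLOTS = [["earlier"], ["the", "a", ""], ["company"],
         ["announce", "announced"], ["its"], ["plan", "plans"]]
# Hypothetical stand-in for an n-gram model: count attested bigrams.
ATTESTED = {("the", "company"), ("company", "announced"), ("its", "plans")}


def toy_score(sentence):
    words = sentence.split()
    return sum(pair in ATTESTED for pair in zip(words, words[1:]))


candidates = list(overgenerate(SLOTS))
print(rank(candidates, toy_score)[0])
```

Even this tiny input yields twelve candidates, which hints at why ranking cost matters when choosing between a bigram and a trigram model.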

sentences.  For  “decl”  sentences,  we  use  the  parse  tree  returned 
from  the  decoder  to  calculate  it.  For  “hope”  sentences,  we  use 
a  parse  tree  generated  from  Collins  parser  (1997)  to  calculate 
it.  Thus,  there  may  be  different  T  scores  for  the  same  sentence. 


One hundred sentences were randomly chosen from among the
successfully generated outputs. Each was paired with its original
Treebank sentence, and then trigram and Charniak-parser scores
were calculated for all the sentences. About 3% of the sentence
pairs were exact matches. Sentences within pairs had very similar
lengths in all cases. The average sentence length was 24 tokens.
The original Treebank sentence serves as a gold standard for the
generator.

The trigram model scored the original Treebank sentences better
than the output of the generator system 71% of the time. In
comparison, the Charniak-parser scores preferred the original
Treebank sentences 83% of the time. This indicates that the
generator system would benefit even more from a statistical model
of syntax than from trigrams.

5  Conclusion 

In statistical summarization, we have shown that having a
lexicalized syntax model reduces the frequency of grammatical
errors through both a careful error analysis and a human
evaluation. We furthermore show that a lexicalized syntax model
might assist in the selection of good translations in a
syntax-based machine translation system. Finally, we present
results that indicate the importance of lexicalized models of
syntax in the HALogen natural language generation system
(Langkilde-Geary, 2002).

It seems from these results that the integration of such a
language model in these tasks and in statistical natural language
generation systems will prove to be fruitful. Most importantly,
all three of these systems use the same forest ranking
algorithm/component. Thus, if this one component were extended to
use a lexicalized model of syntax in place of the current n-gram
scoring method, we expect that the performance of all three
systems would improve significantly.